Quantifying the Java Development Kit

Here are some numbers about rt.jar, the runtime archive of the Java Development Kit, which is a 60.4 MB binary file.

First, we connect to the Neo4j database that contains all the data scanned by jQAssistant and check its version.


In [1]:
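import py2neo
import pandas as pd

# Connect to Neo4j using py2neo's default connection settings
graph = py2neo.Graph()
# Version of the Neo4j kernel, returned as a tuple
graph.dbms.kernel_version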


Out[1]:
(2, 3, 3)

I'm using the following version of the Java SDK here:


In [2]:
graph.data("MATCH (m:ManifestEntry {name:'Implementation-Version'}) RETURN m.value as Version LIMIT 25")


Out[2]:
[{'Version': '1.8.0_131'}]

Let's get some numbers!

Nodes

Number of all Nodes


In [3]:
graph.data("MATCH (n) RETURN COUNT(n) AS NumberOfAllNodes")


Out[3]:
[{'NumberOfAllNodes': 496456}]

Nodes and their Labels


In [4]:
pd.DataFrame(graph.data("MATCH (n) RETURN labels(n) AS Labels, COUNT(n) AS LabelCount ORDER BY LabelCount DESC"))


Out[4]:
LabelCount Labels
0 193682 [Java, Parameter]
1 168130 [Java, Member, Method]
2 85182 [Java, Field, Member]
3 23527 [Java, Member, Constructor, Method]
4 16800 [File, Type, Java, Class]
5 4374 [Value, Annotation]
6 2494 [File, Type, Java, Interface]
7 869 [Value, Primitive]
8 476 [Value, Enum]
9 324 [File, Type, Java, Enum]
10 273 [Value, Array]
11 148 [File, Type, Java, Annotation]
12 45 [Value, ManifestEntry]
13 42 [Value, Class]
14 38 [Java, ManifestSection]
15 34 [File, Type, Java]
16 15 [Concept]
17 1 [File, Java, Manifest]
18 1 [File, Artifact, Container, Archive, Zip, Java...
19 1 [File, Container, Directory]
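
Because a node can carry several labels, the rows above count label combinations rather than single labels. As a minimal sketch, per-label totals can be derived by adding a combination's count to every label it contains. The example uses only the first four rows of the table (the member-level nodes), copied by hand; `label_combinations` and `per_label` are illustrative names, not part of the notebook.

```python
from collections import Counter

# First four rows of the table above, copied by hand:
# (label combination, number of nodes with exactly these labels)
label_combinations = [
    (("Java", "Parameter"), 193682),
    (("Java", "Member", "Method"), 168130),
    (("Java", "Field", "Member"), 85182),
    (("Java", "Member", "Constructor", "Method"), 23527),
]

# Add each combination's count to every label it contains
per_label = Counter()
for labels, count in label_combinations:
    for label in labels:
        per_label[label] += count

# Print the labels from most to least frequent
for label, count in per_label.most_common():
    print(label, count)
```

For these four rows, `Member` adds up to 276839, the same figure that the `signature` property shows in the property table at the end.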

Relationships

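Number of all Relationships

Note that the undirected pattern ()-[r]-() matches a relationship between two different nodes once from each end, so the counts in this section are higher than the number of relationships actually stored. A directed pattern such as ()-[r]->() counts each relationship once.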


In [5]:
graph.data("MATCH ()-[r]-() RETURN COUNT(r) AS NumberOfAllRelationships")


Out[5]:
[{'NumberOfAllRelationships': 4569425}]

Relationships and their Types


In [6]:
pd.DataFrame(graph.data("MATCH ()-[r]-() RETURN type(r) AS Type, COUNT(r) AS TypeCount ORDER BY TypeCount DESC"))


Out[6]:
Type TypeCount
0 INVOKES 1241913
1 READS 612246
2 DECLARES 591762
3 OF_TYPE 544822
4 DEPENDS_ON 465348
5 HAS 389478
6 RETURNS 336678
7 WRITES 207584
8 THROWS 70538
9 CONTAINS 40522
10 EXTENDS 39530
11 IMPLEMENTS 18840
12 ANNOTATED_BY 8712
13 IS 1036
14 HAS_DEFAULT 348
15 REQUIRES 68
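
As a quick sanity check, the per-type counts should add up to the total reported in Out[5]. The sketch below copies the figures from the table above by hand into a plain dictionary; `type_counts` is an illustrative name, not something the notebook defines.

```python
# Relationship counts per type, copied by hand from the table above
type_counts = {
    "INVOKES": 1241913,
    "READS": 612246,
    "DECLARES": 591762,
    "OF_TYPE": 544822,
    "DEPENDS_ON": 465348,
    "HAS": 389478,
    "RETURNS": 336678,
    "WRITES": 207584,
    "THROWS": 70538,
    "CONTAINS": 40522,
    "EXTENDS": 39530,
    "IMPLEMENTS": 18840,
    "ANNOTATED_BY": 8712,
    "IS": 1036,
    "HAS_DEFAULT": 348,
    "REQUIRES": 68,
}

total_from_out5 = 4569425  # value reported in Out[5]

# The per-type counts should sum to the overall count
total = sum(type_counts.values())
print(total == total_from_out5)

# Share of the most frequent relationship type
share = type_counts["INVOKES"] / total
print(f"INVOKES share: {share:.1%}")
```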

Properties

Number of all Properties


In [7]:
graph.data("MATCH (n) RETURN SUM(SIZE(KEYS(n))) as NumberOfAllProperties")


Out[7]:
[{'NumberOfAllProperties': 1629076}]
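
Dividing this figure by the node count from Out[3] gives a rough feel for how much data a node carries on average. A minimal sketch using the two numbers reported above (the variable names are illustrative):

```python
number_of_all_properties = 1629076  # from Out[7]
number_of_all_nodes = 496456        # from Out[3]

# Average number of properties stored per node
avg_properties_per_node = number_of_all_properties / number_of_all_nodes
print(round(avg_properties_per_node, 2))
```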

Number of Nodes per Property


In [8]:
pd.DataFrame(graph.data("""
MATCH (n) WITH KEYS(n) as keys
UNWIND keys as properties
RETURN properties as Property, COUNT(properties) as PropertyCount
ORDER BY PropertyCount DESC"""))


Out[8]:
Property PropertyCount
0 signature 276839
1 name 275058
2 visibility 273437
3 index 193682
4 cyclomaticComplexity 174667
5 transient 79004
6 volatile 79004
7 static 65451
8 final 58568
9 fileName 19803
10 fqn 19800
11 valid 19767
12 md5 19766
13 byteCodeVersion 19766
14 sourceFileName 19749
15 abstract 18272
16 synthetic 13796
17 native 1718
18 value 914
19 id 15
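
To make the aggregation in In [8] more tangible, here is a plain-Python sketch of the same idea: take the property keys of every node, flatten them into one stream, and count how often each key occurs. The three nodes are made-up sample data, not taken from the scanned graph, and `nodes` and `property_counts` are illustrative names.

```python
from collections import Counter

# Hypothetical nodes, each represented as a dict of its properties
nodes = [
    {"name": "toString", "signature": "java.lang.String toString()", "visibility": "public"},
    {"name": "value", "signature": "char[] value", "visibility": "private", "final": True},
    {"fqn": "java.lang.String", "name": "String"},
]

# KEYS(n) yields the keys of each node, UNWIND flattens them into single rows,
# and COUNT groups the rows by key
property_counts = Counter(key for node in nodes for key in node.keys())

# Equivalent of ORDER BY PropertyCount DESC
for prop, count in property_counts.most_common():
    print(prop, count)
```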